Data Model

A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities. For instance, a data model may specify that the data element representing a car be composed of a number of other elements which, in turn, represent the color and size of the car and define its owner.

The term data model can refer to two distinct but closely related concepts. Sometimes it refers to an abstract formalization of the objects and relationships found in a particular application domain: for example, the customers, products, and orders found in a manufacturing organization. At other times it refers to the set of concepts used in defining such formalizations: for example, concepts such as entities, attributes, relations, or tables. So the "data model" of a banking application may be defined using the entity-relationship "data model". This article uses the term in both senses.

A data model explicitly determines the structure of data. Data models are typically specified by a data specialist, data librarian, or digital humanities scholar in a data modeling notation. These notations are often represented in graphical form (Michael R. McCaleb (1999), "A Conceptual Data Model of Datum Systems", National Institute of Standards and Technology, August 1999).

A data model can sometimes be referred to as a data structure, especially in the context of programming languages. Data models are often complemented by function models, especially in the context of enterprise models.
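The car example from the opening paragraph can be sketched in code. This is only an illustration: the class names, field names, and the choice of metres as the unit are assumptions made here, not part of any standard data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Owner:
    """The party that owns a car (hypothetical attribute set)."""
    name: str


@dataclass(frozen=True)
class Size:
    """Car dimensions; metres are an arbitrary choice for this sketch."""
    length_m: float
    width_m: float


@dataclass(frozen=True)
class Car:
    """A car data element composed of other data elements."""
    color: str
    size: Size
    owner: Owner


# One instance that conforms to the structure the model prescribes.
car = Car(
    color="red",
    size=Size(length_m=4.2, width_m=1.8),
    owner=Owner(name="A. Driver"),
)
print(car.owner.name)
```

The classes play the role of the data model (they fix the structure), while `car` is data organized according to it.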


Overview

Managing large quantities of structured and unstructured data is a primary function of information systems. Data models describe the structure, manipulation, and integrity aspects of the data stored in data management systems such as relational databases. They may also describe data with a looser structure, such as word processing documents, email messages, pictures, digital audio, and video: XDM, for example, provides a data model for XML documents.


The role of data models

The main aim of data models is to support the development of information systems by providing the definition and format of data. According to West and Fowler (1999) "if this is done consistently across systems then compatibility of data can be achieved. If the same data structures are used to store and access data then different applications can share data. The results of this are indicated above. However, systems and interfaces often cost more than they should, to build, operate, and maintain. They may also constrain the business rather than support it. A major cause is that the quality of the data models implemented in systems and interfaces is poor".

* "Business rules, specific to how things are done in a particular place, are often fixed in the structure of a data model. This means that small changes in the way business is conducted lead to large changes in computer systems and interfaces".
* "Entity types are often not identified, or incorrectly identified. This can lead to replication of data, data structure, and functionality, together with the attendant costs of that duplication in development and maintenance".
* "Data models for different systems are arbitrarily different. The result of this is that complex interfaces are required between systems that share data. These interfaces can account for between 25-70% of the cost of current systems".
* "Data cannot be shared electronically with customers and suppliers, because the structure and meaning of data has not been standardized. For example, engineering design data and drawings for process plant are still sometimes exchanged on paper".

The reason for these problems is a lack of standards that will ensure that data models will both meet business needs and be consistent.

A data model explicitly determines the structure of data. Typical applications of data models include database models, design of information systems, and enabling exchange of data. Usually, data models are specified in a data modeling language.


Three perspectives

A data model instance may be one of three kinds according to ANSI in 1975:

1. Conceptual data model: describes the semantics of a domain, being the scope of the model. For example, it may be a model of the interest area of an organization or industry. This consists of entity classes, representing kinds of things of significance in the domain, and relationship assertions about associations between pairs of entity classes. A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. In that sense, it defines the allowed expressions in an artificial 'language' with a scope that is limited by the scope of the model.
2. Logical data model: describes the semantics, as represented by a particular data manipulation technology. This consists of descriptions of tables and columns, object-oriented classes, and XML tags, among other things.
3. Physical data model: describes the physical means by which data are stored. This is concerned with partitions, CPUs, tablespaces, and the like.

The significance of this approach, according to ANSI, is that it allows the three perspectives to be relatively independent of each other. Storage technology can change without affecting either the logical or the conceptual model. The table/column structure can change without (necessarily) affecting the conceptual model. In each case, of course, the structures must remain consistent with the other model. The table/column structure may be different from a direct translation of the entity classes and attributes, but it must ultimately carry out the objectives of the conceptual entity class structure.

Early phases of many software development projects emphasize the design of a conceptual data model. Such a design can be detailed into a logical data model. In later stages, this model may be translated into a physical data model. However, it is also possible to implement a conceptual model directly.
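The relative independence of the three perspectives can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The Customer/Order domain, all table and column names, and the use of an index as the only "physical" decision are assumptions made for this example; real physical design covers much more (partitions, tablespaces, and so on).

```python
import sqlite3

# Conceptual level: entity classes and a relationship assertion,
# stated independently of any storage technology.
CONCEPTUAL = {
    "entity_classes": ["Customer", "Order"],
    "relationships": [("Customer", "places", "Order")],
}

# Logical level: the same semantics expressed as tables and columns.
LOGICAL_DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    placed_on   TEXT NOT NULL
);
"""

# Physical level: a storage-oriented choice, here only an index.
PHYSICAL_DDL = "CREATE INDEX idx_order_customer ON customer_order(customer_id)"

conn = sqlite3.connect(":memory:")
conn.executescript(LOGICAL_DDL)
conn.execute(PHYSICAL_DDL)
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO customer_order VALUES (10, 1, '2024-01-01')")


def orders_for(customer_name):
    """Return the order ids placed by the named customer."""
    rows = conn.execute(
        "SELECT o.order_id FROM customer_order AS o "
        "JOIN customer AS c ON c.customer_id = o.customer_id "
        "WHERE c.name = ?",
        (customer_name,),
    ).fetchall()
    return [row[0] for row in rows]


before = orders_for("Acme")
conn.execute("DROP INDEX idx_order_customer")  # change the physical level only
after = orders_for("Acme")
print(before == after)
```

Dropping the index changes how the data are accessed but not what the query means, so the logical and conceptual descriptions are unaffected.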


History

One of the earliest pioneering works in modeling information systems was done by Young and Kent (1958), who argued for "a precise and abstract way of specifying the informational and time characteristics of a data processing problem" (Janis A. Bubenko jr (2007), "From Information Algebra to Enterprise Modelling and Ontologies - a Historical Perspective on Modelling for Information Systems", in: Conceptual Modelling in Information Systems Engineering, John Krogstie et al. eds., pp. 1-18). They wanted to create "a notation that should enable the analyst to organize the problem around any piece of hardware". Their work was the first effort to create an abstract specification and invariant basis for designing different alternative implementations using different hardware components. The next step in IS modeling was taken by CODASYL, an IT industry consortium formed in 1959, who essentially aimed at the same thing as Young and Kent: the development of "a proper structure for machine-independent problem definition language, at the system level of data processing". This led to the development of a specific IS information algebra.

In the 1960s data modeling gained more significance with the initiation of the management information system (MIS) concept. According to Leondes (2002), "during that time, the information system provided the data and information for management purposes. The first generation database system, called Integrated Data Store (IDS), was designed by Charles Bachman at General Electric. Two famous database models, the network data model and the hierarchical data model, were proposed during this period of time". Towards the end of the 1960s, Edgar F. Codd worked out his theories of data arrangement, and proposed the relational model for database management based on first-order predicate logic.

In the 1970s entity-relationship modeling emerged as a new type of conceptual data modeling, originally formalized in 1976 by Peter Chen. Entity-relationship models were being used in the first stage of information system design, during requirements analysis, to describe information needs or the type of information that is to be stored in a database. This technique can describe any ontology, i.e., an overview and classification of concepts and their relationships, for a certain area of interest.

In the 1970s G.M. Nijssen developed the Natural language Information Analysis Method (NIAM), and developed this in the 1980s in cooperation with Terry Halpin into Object-Role Modeling (ORM). However, it was Terry Halpin's 1989 PhD thesis that created the formal foundation on which Object-Role Modeling is based.

Bill Kent, in his 1978 book Data and Reality, compared a data model to a map of a territory, emphasizing that in the real world, "highways are not painted red, rivers don't have county lines running down the middle, and you can't see contour lines on a mountain". In contrast to other researchers who tried to create models that were mathematically clean and elegant, Kent emphasized the essential messiness of the real world, and the task of the data modeler to create order out of chaos without excessively distorting the truth.

In the 1980s, according to Jan L. Harrington (2000), "the development of the object-oriented paradigm brought about a fundamental change in the way we look at data and the procedures that operate on data. Traditionally, data and procedures have been stored separately: the data and their relationship in a database, the procedures in an application program. Object orientation, however, combined an entity's procedure with its data." (Jan L. Harrington (2000), Object-oriented Database Design Clearly Explained, p. 4).

During the early 1990s, three Dutch mathematicians, Guido Bakema, Harm van der Lek, and JanPieter Zwart, continued to develop the work of G.M. Nijssen. They focused more on the communication part of the semantics. In 1997 they formalized the method Fully Communication Oriented Information Modeling (FCO-IM).


Types


Database model

A database model is a specification describing how a database is structured and used. Several such models have been suggested. Common models include:

* Flat model: This may not strictly qualify as a data model. The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another.
* Hierarchical model: The hierarchical model is similar to the network model except that links in the hierarchical model form a tree structure, while the network model allows an arbitrary graph.
* Network model: This model organizes data using two fundamental constructs, called records and sets. Records contain fields, and sets define one-to-many relationships between records: one owner, many members. The network data model is an abstraction of the design concept used in the implementation of databases.
* Relational model: A database model based on first-order predicate logic. Its core idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values. The power of the relational data model lies in its mathematical foundations and a simple user-level paradigm.
* Object-relational model: Similar to a relational database model, but objects, classes, and inheritance are directly supported in database schemas and in the query language.
* Object-role modeling: A method of data modeling that has been defined as "attribute free" and "fact-based". The result is a verifiably correct system, from which other common artifacts, such as ERD, UML, and semantic models may be derived. Associations between data objects are described during the database design procedure, such that normalization is an inevitable result of the process.
* Star schema: The simplest style of data warehouse schema. The star schema consists of a few "fact tables" (possibly only one, justifying the name) referencing any number of "dimension tables". The star schema is considered an important special case of the snowflake schema.

Figures: Flat model; Hierarchical model; Network model; Relational model; Concept-oriented model; Star schema.
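As an illustration of the star schema entry in the list above, the following minimal sketch builds one fact table that references two dimension tables, using Python's built-in `sqlite3` module. The table names, columns, and sample rows are hypothetical and chosen only to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive context for the facts.
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    category   TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    year    INTEGER NOT NULL
);
-- Fact table: measurements, each row referencing the dimensions.
CREATE TABLE fact_sales (
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    date_id    INTEGER NOT NULL REFERENCES dim_date(date_id),
    amount     INTEGER NOT NULL
);
INSERT INTO dim_product VALUES (1, 'tools'), (2, 'toys');
INSERT INTO dim_date VALUES (1, 2023), (2, 2024);
INSERT INTO fact_sales VALUES (1, 1, 100), (1, 2, 150), (2, 2, 40);
""")

# A typical star-schema query: join the fact table to its dimensions,
# filter on one dimension, and aggregate by another.
totals = dict(conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    WHERE d.year = 2024
    GROUP BY p.category
""").fetchall())
print(totals)
```

The fact table sits at the center and every join goes outward to a dimension table, which is the shape the name refers to.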


Data structure diagram

A data structure diagram (DSD) is a diagram and data model used to describe conceptual data models by providing graphical notations which document entities and their relationships, and the constraints that bind them. The basic graphic elements of DSDs are boxes, representing entities, and arrows, representing relationships. Data structure diagrams are most useful for documenting complex data entities.

Data structure diagrams are an extension of the entity-relationship model (ER model). In DSDs, attributes are specified inside the entity boxes rather than outside of them, while relationships are drawn as boxes composed of attributes which specify the constraints that bind entities together. DSDs differ from the ER model in that the ER model focuses on the relationships between different entities, whereas DSDs focus on the relationships of the elements within an entity and enable users to fully see the links and relationships between each entity.

There are several styles for representing data structure diagrams, with the notable difference in the manner of defining cardinality. The choices are between arrow heads, inverted arrow heads (crow's feet), or numerical representation of the cardinality.


Entity-relationship model

An entity-relationship model (ERM), sometimes referred to as an entity-relationship diagram (ERD), can be used to represent an abstract conceptual data model (or semantic data model or physical data model) used in software engineering to represent structured data. There are several notations used for ERMs. As in DSDs, attributes are specified inside the entity boxes rather than outside of them, while relationships are drawn as lines, with the relationship constraints as descriptions on the line. The E-R model, while robust, can become visually cumbersome when representing entities with several attributes.

There are several styles for representing data structure diagrams, with a notable difference in the manner of defining cardinality. The choices are between arrow heads, inverted arrow heads (crow's feet), or numerical representation of the cardinality.
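The numerical representation of cardinality mentioned above can also be written down as data and checked against instances. The sketch below is one possible encoding, not an established notation: the `Relationship` class, its `max_sources_per_target` field, and the sample links are all assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relationship:
    """A relationship line between two entity types with a numeric cardinality."""
    name: str
    source: str
    target: str
    # Upper bound on how many source instances may link to one target
    # instance; None stands for "many" (no upper bound).
    max_sources_per_target: Optional[int]


def violations(rel, links):
    """Return the target instances linked to more sources than rel allows."""
    if rel.max_sources_per_target is None:
        return []
    counts = Counter(target for _source, target in set(links))
    return sorted(t for t, n in counts.items() if n > rel.max_sources_per_target)


# "A customer places many orders; each order is placed by one customer."
places = Relationship("places", "Customer", "Order", max_sources_per_target=1)
links = [("alice", "o1"), ("alice", "o2"), ("bob", "o2")]
print(violations(places, links))
```

Here order `o2` is linked to two customers, so it is reported as violating the one-customer-per-order constraint.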


Geographic data model

A data model in geographic information systems is a mathematical construct for representing geographic objects or surfaces as data. For example:

* the vector data model represents geography as points, lines, and polygons;
* the raster data model represents geography as cell matrices that store numeric values;
* the triangulated irregular network (TIN) data model represents geography as sets of contiguous, nonoverlapping triangles.

Figures: Groups related to the process of making a map (David R. Soller and Thomas M. Berg (2003), "The National Geologic Map Database Project: Overview and Progress", U.S. Geological Survey Open-File Report 03–471); NGMDB data model applications; NGMDB databases linked together; Representing 3D map information.
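The three geographic data models listed above can be sketched with plain Python structures. The coordinates, the elevation values, and the helper `polygon_area` are invented for this illustration; production GIS software uses far richer representations.

```python
# Vector model: geography as points, lines, and polygons (coordinate tuples).
point = (2.0, 3.0)
line = [(0.0, 0.0), (4.0, 0.0)]
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Raster model: a cell matrix storing numeric values (say, elevation).
raster = [
    [10, 12, 13],
    [11, 14, 15],
]

# TIN model: contiguous, non-overlapping triangles over shared vertices.
tin_vertices = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
tin_triangles = [(0, 1, 2), (0, 2, 3)]  # indices into tin_vertices

# The two triangles tile the same rectangle that the polygon describes.
tin_area = sum(
    polygon_area([tin_vertices[i] for i in triangle])
    for triangle in tin_triangles
)
print(polygon_area(polygon), tin_area)
```

The same rectangle is described once as a vector polygon and once as a pair of TIN triangles, which is why the two areas agree.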


Generic data model

Generic data models are generalizations of conventional data models. They define standardized general relation types, together with the kinds of things that may be related by such a relation type. Generic data models are developed as an approach to solving some shortcomings of conventional data models. For example, different modelers usually produce different conventional data models of the same domain. This can lead to difficulty in bringing the models of different people together and is an obstacle for data exchange and data integration. Invariably, however, this difference is attributable to different levels of abstraction in the models and differences in the kinds of facts that can be instantiated (the semantic expression capabilities of the models). The modelers need to communicate and agree on certain elements that are to be rendered more concretely, in order to make the differences less significant.
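A minimal sketch of the idea, assuming a tiny invented vocabulary: each standardized relation type declares the kinds of things it may relate, and any fact is checked against that declaration. The relation names, kinds, and sample things below are hypothetical.

```python
# Standardized relation types, each with the kinds of things it may relate
# (kind of the left-hand thing, kind of the right-hand thing).
RELATION_TYPES = {
    "is_classified_as": ("individual", "class"),
    "is_part_of": ("individual", "individual"),
}

# The kind of each known thing.
KINDS = {
    "pump-101": "individual",
    "plant-A": "individual",
    "pump": "class",
}


def is_valid(fact):
    """Check that a (subject, relation, object) fact uses a known relation
    type and relates things of the kinds that relation type allows."""
    subject, relation, obj = fact
    if relation not in RELATION_TYPES:
        return False
    left_kind, right_kind = RELATION_TYPES[relation]
    return KINDS.get(subject) == left_kind and KINDS.get(obj) == right_kind


facts = [
    ("pump-101", "is_classified_as", "pump"),
    ("pump-101", "is_part_of", "plant-A"),
    ("pump", "is_part_of", "plant-A"),  # a class cannot be part of an individual here
]
print([is_valid(fact) for fact in facts])
```

Because every modeler expresses facts with the same small set of relation types, facts from different sources can be combined without first reconciling table structures.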


Semantic data model

A semantic data model in software engineering is a technique to define the meaning of data within the context of its interrelationships with other data. It is an abstraction that defines how the stored symbols relate to the real world, and it is sometimes called a conceptual data model. The logical data structure of a database management system (DBMS), whether hierarchical, network, or relational, cannot totally satisfy the requirements for a conceptual definition of data because it is limited in scope and biased toward the implementation strategy employed by the DBMS. Therefore, the need to define data from a conceptual view has led to the development of semantic data modeling techniques. As illustrated in the figure, the real world, in terms of resources, ideas, events, etc., is symbolically defined within physical data stores, and the semantic data model describes how those stored symbols relate to it. Thus, the model must be a true representation of the real world.


Topics


Data architecture

Data architecture is the design of data for use in defining the target state and the subsequent planning needed to reach the target state. It is usually one of several architecture domains that form the pillars of an enterprise architecture or solution architecture. A data architecture describes the data structures used by a business and/or its applications. There are descriptions of data in storage and data in motion; descriptions of data stores, data groups, and data items; and mappings of those data artifacts to data qualities, applications, locations, etc. Essential to realizing the target state, data architecture describes how data is processed, stored, and utilized in a given system. It provides criteria for data processing operations that make it possible to design data flows and also control the flow of data in the system.


Data modeling

Data modeling in software engineering is the process of creating a data model by applying formal data model descriptions using data modeling techniques. Data modeling is a technique for defining business requirements for a database. It is sometimes called ''database modeling'' because a data model is eventually implemented in a database (Whitten, Jeffrey L.; Lonnie D. Bentley, Kevin C. Dittman (2004). ''Systems Analysis and Design Methods''. 6th edition). The figure illustrates the way data models are developed and used today. A conceptual data model is developed based on the data requirements for the application that is being developed, perhaps in the context of an activity model. The data model will normally consist of entity types, attributes, relationships, integrity rules, and the definitions of those objects. This is then used as the starting point for interface or database design.
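The ingredients listed above (entity types, attributes, relationships, and integrity rules) can be sketched in Python as follows. The `Customer`, `Order`, and `OrderLine` entity types and the rule that quantities must be positive are hypothetical, picked only to give each ingredient a concrete form.

```python
from dataclasses import dataclass, field

@dataclass
class Customer:
    """Entity type with two attributes."""
    customer_id: int
    name: str

@dataclass
class OrderLine:
    """Entity type carrying an integrity rule on one of its attributes."""
    product: str
    quantity: int

    def __post_init__(self) -> None:
        # Integrity rule (assumed for illustration): quantity must be positive.
        if self.quantity <= 0:
            raise ValueError("quantity must be positive")

@dataclass
class Order:
    """Entity type with relationships to Customer and OrderLine."""
    order_id: int
    customer: Customer                                     # placed by one customer
    lines: list[OrderLine] = field(default_factory=list)   # contains many lines

    def total_quantity(self) -> int:
        return sum(line.quantity for line in self.lines)

ada = Customer(1, "Ada")
order = Order(100, ada, [OrderLine("bolt", 3), OrderLine("nut", 2)])
```

A model at this level says nothing yet about tables, indexes, or storage; those decisions belong to the later interface or database design.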


Data properties

Some important properties of data for which requirements need to be met are:
* definition-related properties
** ''relevance'': the usefulness of the data in the context of your business.
** ''clarity'': the availability of a clear and shared definition for the data.
** ''consistency'': the compatibility of the same type of data from different sources.
* content-related properties
** ''timeliness'': the availability of data at the time required and how up-to-date that data is.
** ''accuracy'': how close to the truth the data is.
* properties related to both definition and content
** ''completeness'': how much of the required data is available.
** ''accessibility'': where, how, and to whom the data is available or not available (e.g. security).
** ''cost'': the cost incurred in obtaining the data, and making it available for use.


Data organization

Another kind of data model describes how to organize data using a database management system or other data management technology. It describes, for example, relational tables and columns or object-oriented classes and attributes. Such a data model is sometimes referred to as the ''physical data model'', but in the original ANSI three schema architecture, it is called "logical". In that architecture, the physical model describes the storage media (cylinders, tracks, and tablespaces). Ideally, this model is derived from the more conceptual data model described above. It may differ, however, to account for constraints like processing capacity and usage patterns. While ''data analysis'' is a common term for data modeling, the activity actually has more in common with the ideas and methods of ''synthesis'' (inferring general concepts from particular instances) than it does with ''analysis'' (identifying component concepts from more general ones). Data modeling strives to bring the data structures of interest together into a cohesive, inseparable whole by eliminating unnecessary data redundancies and by relating data structures with relationships. A different approach is to use adaptive systems such as artificial neural networks that can autonomously create implicit models of data.


Data structure

A data structure is a way of storing data in a computer so that it can be used efficiently. It is an organization of mathematical and logical concepts of data. Often a carefully chosen data structure will allow the most efficient algorithm to be used. The choice of the data structure often begins from the choice of an abstract data type. A data model describes the structure of the data within a given domain and, by implication, the underlying structure of that domain itself. This means that a data model in fact specifies a dedicated ''grammar'' for a dedicated artificial language for that domain. A data model represents classes of entities (kinds of things) about which a company wishes to hold information, the attributes of that information, and relationships among those entities and (often implicit) relationships among those attributes. The model describes the organization of the data to some extent irrespective of how data might be represented in a computer system. The entities represented by a data model can be tangible entities, but models that include such concrete entity classes tend to change over time. Robust data models often identify abstractions of such entities. For example, a data model might include an entity class called "Person", representing all the people who interact with an organization. Such an abstract entity class is typically more appropriate than ones called "Vendor" or "Employee", which identify specific roles played by those people.

Figures: Array; Hash table; Linked list; Stack (data structure).
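The "Person" example can be made concrete with a short Python sketch in which roles are data attached to one abstract entity class instead of separate "Vendor" and "Employee" classes. The `Person` fields and method names are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Person:
    """Abstract entity class: anyone who interacts with the organization."""
    name: str
    roles: set[str] = field(default_factory=set)   # roles are data, not classes

    def add_role(self, role: str) -> None:
        self.roles.add(role)

    def has_role(self, role: str) -> bool:
        return role in self.roles

alice = Person("Alice")
alice.add_role("Employee")
alice.add_role("Vendor")   # the same person can also act as a vendor
```

When the organization later introduces a new role, only new data is needed; the entity class itself stays unchanged, which is the robustness the paragraph above describes.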


Data model theory

The term data model can have two meanings (Beynon-Davies P. (2004). ''Database Systems'' 3rd Edition. Palgrave, Basingstoke, UK):
# A data model ''theory'', i.e. a formal description of how data may be structured and accessed.
# A data model ''instance'', i.e. applying a data model ''theory'' to create a practical data model ''instance'' for some particular application.
A data model theory has three main components:
* The structural part: a collection of data structures which are used to create databases representing the entities or objects modeled by the database.
* The integrity part: a collection of rules governing the constraints placed on these data structures to ensure structural integrity.
* The manipulation part: a collection of operators which can be applied to the data structures, to update and query the data contained in the database.
For example, in the relational model, the structural part is based on a modified concept of the mathematical relation; the integrity part is expressed in first-order logic; and the manipulation part is expressed using the relational algebra, tuple calculus, and domain calculus. A data model instance is created by applying a data model theory. This is typically done to solve some business enterprise requirement. Business requirements are normally captured by a semantic logical data model. This is transformed into a physical data model instance, from which a physical database is generated. For example, a data modeler may use a data modeling tool to create an entity-relationship model of the corporate data repository of some business enterprise. This model is transformed into a relational model, which in turn generates a relational database.
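The three components can be seen side by side in a small example that uses Python's standard `sqlite3` module. SQL is only a practical approximation of the relational model's formal logic and algebra, and the `customer` and `account` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only on request

# Structural part: relations (tables) and their attributes (columns).
# Integrity part: NOT NULL, CHECK, and foreign-key constraints.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE account (
    account_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    balance     INTEGER NOT NULL CHECK (balance >= 0)
);
""")

# Manipulation part: operators that update and query the data.
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.execute("INSERT INTO account VALUES (10, 1, 250)")

try:
    # Violates the foreign-key rule: customer 99 does not exist.
    conn.execute("INSERT INTO account VALUES (11, 99, 50)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute(
    "SELECT c.name, a.balance "
    "FROM customer AS c JOIN account AS a ON a.customer_id = c.customer_id"
).fetchall()
```

Here the `CREATE TABLE` statements play the structural role, the constraints play the integrity role, and `INSERT` and `SELECT ... JOIN` play the manipulation role.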


Patterns

Patterns are common data modeling structures that occur in many data models (Len Silverston & Paul Agnew (2008). ''The Data Model Resource Book: Universal Patterns for Data Modeling'').


Related models


Data-flow diagram

A data-flow diagram (DFD) is a graphical representation of the "flow" of data through an information system. It differs from the flowchart as it shows the ''data'' flow instead of the ''control'' flow of the program. A data-flow diagram can also be used for the visualization of data processing (structured design). Data-flow diagrams were invented by Larry Constantine, the original developer of structured design, based on Martin and Estrin's "data-flow graph" model of computation. The DFD is designed to show how a system is divided into smaller portions and to highlight the flow of data between those parts. It is common practice to draw a context-level data-flow diagram first, which shows the interaction between the system and outside entities. This context-level data-flow diagram is then "exploded" to show more detail of the system being modeled.
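A data-flow diagram can be approximated in code as a set of labeled flows between named nodes. The Python sketch below models a context-level diagram and an "exploded" version of it, then checks that both exchange the same data with the outside world. The node names and the `boundary_flows` helper are hypothetical, and the consistency check is a common diagramming convention assumed here, not something the text above prescribes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    source: str
    target: str
    data: str    # a flow is labeled with the data it carries, not with control

# Context level: the whole system is one node next to the outside entities.
context_level = [
    Flow("Customer", "Order System", "order"),
    Flow("Order System", "Customer", "invoice"),
]

# Exploded level: the system is divided into smaller parts.
level_1 = [
    Flow("Customer", "Validate Order", "order"),
    Flow("Validate Order", "Orders Store", "valid order"),
    Flow("Orders Store", "Bill Customer", "valid order"),
    Flow("Bill Customer", "Customer", "invoice"),
]

def boundary_flows(flows, internal):
    """Return the flows that cross the system boundary, with every
    internal node collapsed into the single label "SYSTEM"."""
    crossing = set()
    for f in flows:
        src_in, dst_in = f.source in internal, f.target in internal
        if src_in != dst_in:
            crossing.add(("SYSTEM" if src_in else f.source,
                          "SYSTEM" if dst_in else f.target,
                          f.data))
    return crossing

outer = boundary_flows(context_level, {"Order System"})
inner = boundary_flows(level_1, {"Validate Order", "Orders Store", "Bill Customer"})
consistent = outer == inner
```

Note that nothing in the structure says when or in what order the parts run; that information belongs to a flowchart, not a data-flow diagram.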


Information model

An information model is not a type of data model, but more or less an alternative model. Within the field of software engineering, both a data model and an information model can be abstract, formal representations of entity types that include their properties, relationships and the operations that can be performed on them. The entity types in the model may be kinds of real-world objects, such as devices in a network, or they may themselves be abstract, such as the entities used in a billing system. Typically, they are used to model a constrained domain that can be described by a closed set of entity types, properties, relationships and operations. According to Lee (1999), an information model is a representation of concepts, relationships, constraints, rules, and operations to specify data semantics for a chosen domain of discourse. It can provide a sharable, stable, and organized structure of information requirements for the domain context (Y. Tina Lee (1999). "Information modeling from design to implementation". National Institute of Standards and Technology). More generally, the term ''information model'' is used for models of individual things, such as facilities, buildings, process plants, etc. In those cases the concept is specialised to Facility Information Model, Building Information Model, Plant Information Model, etc. Such an information model is an integration of a model of the facility with the data and documents about the facility. An information model provides formalism to the description of a problem domain without constraining how that description is mapped to an actual implementation in software. There may be many mappings of the information model. Such mappings are called data models, irrespective of whether they are object models (e.g. using UML), entity relationship models or XML schemas.


Object model

An object model in computer science is a collection of objects or classes through which a program can examine and manipulate some specific parts of its world; in other words, it is the object-oriented interface to some service or system. Such an interface is said to be the ''object model of'' the represented service or system. For example, the Document Object Model (DOM) is a collection of objects that represent a page in a web browser, used by script programs to examine and dynamically change the page. There is a Microsoft Excel object model for controlling Microsoft Excel from another program, and the ASCOM Telescope Driver is an object model for controlling an astronomical telescope. In computing the term ''object model'' has a distinct second meaning of the general properties of objects in a specific computer programming language, technology, notation or methodology that uses them. Examples include the ''Java object model'', the ''COM object model'', and ''the object model of OMT''. Such object models are usually defined using concepts such as class, message, inheritance, polymorphism, and encapsulation. There is an extensive literature on formalized object models as a subset of the formal semantics of programming languages.
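The concepts named in the second meaning can be illustrated with a minimal Python sketch. The `Shape`, `Rectangle`, and `Square` classes are hypothetical, and the leading underscore is only Python's convention for marking state as internal, not enforced encapsulation.

```python
class Shape:
    """Class: a template that defines state and behavior for its objects."""

    def __init__(self, name: str) -> None:
        self._name = name   # encapsulation by convention: internal state

    def area(self) -> float:
        raise NotImplementedError

    def describe(self) -> str:
        # Sends the "area" message to self; the object's own class answers.
        return f"{self._name} with area {self.area():.1f}"

class Rectangle(Shape):
    """Inheritance: Rectangle reuses describe() from Shape."""

    def __init__(self, width: float, height: float) -> None:
        super().__init__("rectangle")
        self._width, self._height = width, height

    def area(self) -> float:
        return self._width * self._height

class Square(Rectangle):
    def __init__(self, side: float) -> None:
        super().__init__(side, side)
        self._name = "square"

# Polymorphism: the same message yields class-specific behavior.
descriptions = [shape.describe() for shape in (Rectangle(2, 3), Square(2))]
```

Other languages define these concepts differently (for example, with enforced access modifiers or multiple dispatch), which is why each language is said to have its own object model.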


Object-Role Model

Object-Role Modeling (ORM) is a method for conceptual modeling, and can be used as a tool for information and rules analysis (Joachim Rossberg and Rickard Redler (2005). ''Pro Scalable .NET 2.0 Application Designs''. Page 27). Object-Role Modeling is a fact-oriented method for performing systems analysis at the conceptual level. The quality of a database application depends critically on its design. To help ensure correctness, clarity, adaptability and productivity, information systems are best specified first at the conceptual level, using concepts and language that people can readily understand. The conceptual design may include data, process and behavioral perspectives, and the actual DBMS used to implement the design might be based on one of many logical data models (relational, hierarchic, network, object-oriented, etc.) ("Object Role Modeling: An Overview" (msdn.microsoft.com). Retrieved 19 September 2008).
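The fact-oriented idea can be sketched in Python as fact types that hold elementary facts, can be read back as plain sentences, and carry simple constraints. The `FactType` class, its `unique_first_role` flag, and the sample facts are assumptions made for illustration; they are not part of any ORM tool.

```python
class FactType:
    """A fact type with a natural-language reading and an optional
    uniqueness constraint on its first role."""

    def __init__(self, reading: str, unique_first_role: bool = False) -> None:
        self.reading = reading
        self.unique_first_role = unique_first_role
        self.facts: list[tuple[str, str]] = []

    def add(self, first: str, second: str) -> None:
        # Constraint: each first-role object may appear in at most one fact.
        if self.unique_first_role and any(f == first for f, _ in self.facts):
            raise ValueError(f"{first!r} already plays this role")
        self.facts.append((first, second))

    def verbalize(self) -> list[str]:
        """Render every stored fact as a sentence a domain expert can check."""
        return [self.reading.format(a, b) for a, b in self.facts]

# Each person works for at most one company (hypothetical business rule).
works_for = FactType("Person {} works for Company {}", unique_first_role=True)
works_for.add("Ann", "Acme")
works_for.add("Bob", "Acme")
sentences = works_for.verbalize()
```

Verbalizing facts as sentences is what lets non-technical stakeholders validate the model before it is mapped to a logical data model.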


Unified Modeling Language models

The Unified Modeling Language (UML) is a standardized general-purpose modeling language in the field of software engineering. It is a graphical language for visualizing, specifying, constructing, and documenting the artifacts of a software-intensive system. The Unified Modeling Language offers a standard way to write a system's blueprints (Grady Booch, Ivar Jacobson & Jim Rumbaugh (2005). ''OMG Unified Modeling Language Specification''), including:
* Conceptual things such as business processes and system functions
* Concrete things such as programming language statements and database schemas, and
* Reusable software components.
UML offers a mix of functional models, data models, and database models.


See also

* Business process model
* Core architecture data model
* Common data model, any standardised data model
* Data collection system
* Data dictionary
* Data Format Description Language (DFDL)
* Distributional–relational database
* JC3IEDM
* Process model


Further reading

* David C. Hay (1996). ''Data Model Patterns: Conventions of Thought''. New York: Dorset House Publishers, Inc.
* Len Silverston (2001). ''The Data Model Resource Book'' Volume 1/2. John Wiley & Sons.
* Len Silverston & Paul Agnew (2008). ''The Data Model Resource Book: Universal Patterns for Data Modeling'' Volume 3. John Wiley & Sons.
* Matthew West and Julian Fowler (1999). ''Developing High Quality Data Models''. The European Process Industries STEP Technical Liaison Executive (EPISTLE).
* Matthew West (2011). ''Developing High Quality Data Models''. Morgan Kaufmann.